Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DataTree.isel and DataTree.sel #9588

Merged
merged 7 commits into from
Oct 10, 2024
Merged

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Oct 7, 2024

These methods handle dimensions that are not missing on some nodes properly, as discussed in #8949

Adding additional indexing methods (e.g., head, tail, thin, interp, reindex), should be pretty easy to do in this style.

  • Tests added
  • New functions/methods are listed in api.rst

@shoyer shoyer requested a review from TomNicholas October 7, 2024 11:47
@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Oct 7, 2024
Comment on lines 1633 to 1635
for k in node_indexers:
if k not in node._node_coord_variables and k in node_result.coords:
del node_result.coords[k]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is basically saying "if the result of indexing would cause an inherited coordinate to be duplicated, then remove that duplicated coordinate from the result"?

Doesn't this imply an inefficiency, in that we are potentially doing the work to index coordinates that we're only going to drop afterwards?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment to explain. It is indeed a small source of inefficiency.

xarray/core/datatree.py Show resolved Hide resolved
xarray/core/datatree.py Outdated Show resolved Hide resolved
Comment on lines 1667 to 1668
# TODO: reimplement in terms of map_index_queries(), to avoid
# redundant index look-ups on child nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to my comment above or a separate issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a separate issue.

Comment on lines +1605 to +1612
expected = DataTree.from_dict(
{
"/": xr.Dataset(coords={"x": 2}),
"/child": xr.Dataset({"foo": 4}),
}
)
actual = tree.isel(x=-1)
assert_equal(actual, expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this passes the x coordinate gets duplicated doesn't it? As the result is a non-indexed scalar. Does that mean this would have failed if you had used the definition of assert_identical from #9473 to check for duplication?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this should also pass

actual = tree.isel(x=[0])
assert_identical(actual, expected)

because the result will have inherited instead of duplicated coordinates instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote special logic for removing duplicated scalar coordinates to handle just this!

There is no longer a need to implement a different version of inheritance for assert_identical than for assert_equal, because we no longer allow any cases in the DataTree data model where inheritance is ambiguous:

  • Deduplication of coordinates with an index removes them from child nodes.
  • Other coordinates are no longer inherited.

xarray/core/datatree.py Show resolved Hide resolved
Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanations! The comments definitely make it easier to understand the subtleties.

@shoyer shoyer merged commit c057d13 into pydata:main Oct 10, 2024
29 checks passed
@shoyer shoyer deleted the datatree-indexing branch October 10, 2024 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
Development

Successfully merging this pull request may close these issues.

2 participants